1) How does each country rank by the share of its population suffering from a specific mental illness?
2) What kind of relationship patterns do we see between the disorders?
3) What is the general trend in depression across the three main education levels?
4) How do depression percentages compare between employed people and active job searchers?
5) How do personal and work-related factors affect an individual's mental health, broken down by gender?
6) How do depression levels change over the years across both gender and age groups?
The domain of interest of this study is mental health; its focus is to analyze trends in the causes of mental health concerns. The problem investigated is which personal and professional factors affect people with mental health concerns, based on data collected from various parts of the world. We will observe how employment status and income bracket factor into this situation, and how they relate to corporate work, personal life and the location people are based in. Moreover, this study will compare how living in different countries adds to this issue, and how past factors, such as family history and the treatment options an individual has pursued, contribute to it. The study will emphasize the changes and trends that can be identified from 1990 to 2017, the range of factors associated with them, and how these factors affect an individual's life by contributing to the development of various mental disorders. Understanding and acknowledging these issues, and working to alleviate them, is important for an individual's well-being and overall quality of life.
In the scope of our study, we are working with five different datasets around the general theme of mental health. The first includes the prevalence of seven different kinds of disorders among the population of various countries across the globe from 1990 to 2017, with each disorder expressed as a percentage of the population. The second dataset includes the prevalence of depression across different age groups in several countries; it is similar in size to the first and also covers 1990 to 2017. The third dataset includes the prevalence of depression broken down into males and females across several countries from 1990 to 2017. The source of these three datasets is the Institute for Health Metrics and Evaluation (IHME), an independent global health research center at the University of Washington. The fourth dataset has data on depression prevalence across several countries, disaggregated by education level and employment status; it was gathered from the Organisation for Economic Co-operation and Development (OECD) and only contains information from 2014. All four of these datasets are in .xlsx format. The fifth dataset is a Tech Survey CSV file containing responses from over 1200 people. The data includes categorical answers (such as Yes, No, Don't know) regarding their employment, family, and mental health concerns. The source of this dataset is OSMH (formerly OSMI). Open Sourcing Mental Health is a non-profit, 501(c)(3) corporation dedicated to raising awareness, educating, and providing resources to support mental wellness in the tech and open source communities.
Resources available for download on the IHME website can be used, shared, modified or built upon by non-commercial users in accordance with the IHME FREE-OF-CHARGE NON-COMMERCIAL USER AGREEMENT.
The guiding questions that will be addressed using the above datasets in this investigation are:
These questions will be important when researching and analyzing data about mental health. Our insights into these questions will provide evidence that can inform public health policies. Furthermore, companies can use the analysis to create a better work environment for their employees and improve their employee retention rate. On top of that, presenting the insights of our project can help public health professionals strategize policies to improve the population's overall mental health. The goal of this study is to create awareness, help people realize the importance of their mental well-being, and enable them to be more productive in their day-to-day activities.
#Importing necessary libraries
import pandas as pd
import plotly.express as px
import numpy as np
from plotly import graph_objs as go
import sys
!{sys.executable} -m pip install --user pycountry_convert
!{sys.executable} -m pip install --user openpyxl
import pycountry_convert as pc
# Importing the dataset "Prevalence by Mental Disorder and Substance" for answering the first two guiding questions
df = pd.read_excel("Mental health Depression disorder Data.xlsx","prevalence-by-mental-and-substa")
# Data Wrangling: Renaming Columns
df.rename(columns = {'Entity':'Country',
'Schizophrenia (%)': 'Schizophrenia',
'Bipolar disorder (%)': 'Bipolar Disorder',
'Eating disorders (%)': 'Eating Disorder',
'Anxiety disorders (%)': 'Anxiety Disorder',
'Drug use disorders (%)': 'Drug use Disorder',
'Alcohol use disorders (%)': 'Alcohol Disorder',
'Depression (%)': 'Depression'}, inplace = True)
df.dtypes
| Column | dtype |
|---|---|
| Country | object |
| Code | object |
| Year | float64 |
| Schizophrenia | float64 |
| Bipolar Disorder | float64 |
| Eating Disorder | float64 |
| Anxiety Disorder | float64 |
| Drug use Disorder | float64 |
| Depression | float64 |
| Alcohol Disorder | float64 |
#Data Wrangling: Convert Year from float to integer
df["Year"]= df['Year'].astype('int')
#Convert from Country Name to Continents
continent = []
for name in df['Country']:
    try:
        # Country name -> ISO alpha-2 code -> continent code (e.g. 'AS', 'EU', 'AF')
        country_code = pc.country_name_to_country_alpha2(name, cn_name_format="default")
        continent.append(pc.country_alpha2_to_continent_code(country_code))
    except KeyError:
        # Sub-regions and aggregates listed under Country are not recognised; mark them 'none'
        continent.append('none')
#Add the continent list as a new DataFrame column
df['Continent'] = continent
display(df)
|  | Country | Code | Year | Schizophrenia | Bipolar Disorder | Eating Disorder | Anxiety Disorder | Drug use Disorder | Depression | Alcohol Disorder | Continent |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 0.160560 | 0.697779 | 0.101855 | 4.828830 | 1.677082 | 4.071831 | 0.672404 | AS |
| 1 | Afghanistan | AFG | 1991 | 0.160312 | 0.697961 | 0.099313 | 4.829740 | 1.684746 | 4.079531 | 0.671768 | AS |
| 2 | Afghanistan | AFG | 1992 | 0.160135 | 0.698107 | 0.096692 | 4.831108 | 1.694334 | 4.088358 | 0.670644 | AS |
| 3 | Afghanistan | AFG | 1993 | 0.160037 | 0.698257 | 0.094336 | 4.830864 | 1.705320 | 4.096190 | 0.669738 | AS |
| 4 | Afghanistan | AFG | 1994 | 0.160022 | 0.698469 | 0.092439 | 4.829423 | 1.716069 | 4.099582 | 0.669260 | AS |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6463 | Zimbabwe | ZWE | 2013 | 0.155670 | 0.607993 | 0.117248 | 3.090168 | 0.766280 | 3.128192 | 1.515641 | AF |
| 6464 | Zimbabwe | ZWE | 2014 | 0.155993 | 0.608610 | 0.118073 | 3.093964 | 0.768914 | 3.140290 | 1.515470 | AF |
| 6465 | Zimbabwe | ZWE | 2015 | 0.156465 | 0.609363 | 0.119470 | 3.098687 | 0.771802 | 3.155710 | 1.514751 | AF |
| 6466 | Zimbabwe | ZWE | 2016 | 0.157111 | 0.610234 | 0.121456 | 3.104294 | 0.772275 | 3.174134 | 1.513269 | AF |
| 6467 | Zimbabwe | ZWE | 2017 | 0.157963 | 0.611242 | 0.124443 | 3.110926 | 0.772648 | 3.192789 | 1.510943 | AF |
6468 rows × 11 columns
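The loop above calls pycountry_convert once per row, so every entity is looked up again for each of its 28 years. A minimal sketch of an alternative, assuming the lookup is wrapped in a function that raises `KeyError` for unrecognised names: look up each distinct name once and broadcast the result with `Series.map`. The helper `add_continent_column` and the stub lookup used to exercise it are hypothetical, not part of pycountry_convert.

```python
import pandas as pd


def add_continent_column(frame, lookup, column="Country"):
    """Return a copy of `frame` with a 'Continent' column.

    `lookup` (hypothetical) maps one country name to a continent code and
    raises KeyError for names it does not recognise (regions, aggregates).
    Each distinct name is looked up once, then broadcast to every row.
    """
    mapping = {}
    for name in frame[column].unique():
        try:
            mapping[name] = lookup(name)
        except KeyError:
            mapping[name] = 'none'  # same sentinel the notebook filters on later
    out = frame.copy()
    out['Continent'] = out[column].map(mapping)
    return out


# In the notebook the lookup could be built from pycountry_convert (not run here):
# def pc_lookup(name):
#     code = pc.country_name_to_country_alpha2(name, cn_name_format="default")
#     return pc.country_alpha2_to_continent_code(code)
# df = add_continent_column(df, pc_lookup)
```

With 6468 rows and 28 years per entity, this reduces the number of lookups to the number of distinct entity names.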
#Grouped by year and took the mean of all disorders across all entities (countries and regional aggregates) between 1990-2017
dfyear = df.groupby(["Year"],as_index=False)[['Schizophrenia','Bipolar Disorder','Eating Disorder','Anxiety Disorder','Depression','Alcohol Disorder','Drug use Disorder']].mean()
#display(dfyear)
fig = px.area(dfyear, x="Year", y=['Schizophrenia','Bipolar Disorder','Eating Disorder','Anxiety Disorder','Depression','Alcohol Disorder','Drug use Disorder'])
fig.update_layout(
title_text="Trend by Illness around the world",
title_x=0.5,
legend_title_text="Illness",
yaxis_title="Disease Percentage"
)
fig.show()
Analyzing this area graph we can infer the following:
Among the percentages of people suffering from the various mental illnesses, Anxiety Disorder and Depression are the most prevalent of all the illnesses plotted on this graph: their two bands take up most of the area under the curve.
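To back the visual reading with numbers, a small helper can average each disorder over the years and express it as a share of the stacked total. This is a sketch: `disorder_shares` is a hypothetical helper, and it assumes a frame shaped like `dfyear` (one column per disorder).

```python
import pandas as pd


def disorder_shares(frame, columns):
    """Average each disorder over all rows and express it as a share of the stacked total."""
    means = frame[columns].mean()
    shares = means / means.sum()  # fraction of the total stacked area
    return (pd.DataFrame({'mean_pct': means, 'share_of_stack': shares})
              .sort_values('mean_pct', ascending=False))
```

In the notebook it would be called as `disorder_shares(dfyear, ['Schizophrenia', 'Bipolar Disorder', ...])`; because the area chart stacks the same yearly means, the top rows correspond to the largest bands.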
# Remove rows whose Continent is "none";
# these are sub-regions listed under the Country column, so removing them lets us focus on individual countries
test = df[df["Continent"] != "none"]
#Grouped by country, continent and year (one row per country and year) with the mean of each disorder
test = test.groupby(["Country","Continent","Year"],as_index=False)[['Schizophrenia','Bipolar Disorder','Eating Disorder','Anxiety Disorder','Depression','Alcohol Disorder','Drug use Disorder']].mean()
#display(test)
fig = px.scatter(test, x="Depression", y="Alcohol Disorder", color="Continent", trendline="ols",animation_frame = 'Year',
hover_name="Country", log_x=True)
fig.update_layout(
title_text="Relation between Alcohol Disorder & Depression",
title_x=0.5,
)
fig.show()
Analyzing this Scatter Plot we can infer the following:
On this graph we compare Alcohol Disorder and Depression across all countries between 1990 and 2017 to see how they relate to one another. The graph is interactive: legend entries can be toggled on and off to see the trends for one or several continents.
Looking at Europe, Africa, North America, South America and Oceania, we see a weak positive correlation between depression and alcohol use disorders: as a country's depression level goes up, its alcohol use disorder percentage also increases slightly.
For Asian countries, we see a weak negative correlation between depression and alcohol use disorders: as a country's depression level goes up, its alcohol use disorder percentage decreases slightly. This insight may be misleading, since it differs from the other continents; it might be due to missing data from countries in Asia or inconsistencies in data collection.
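The weak positive and negative relationships seen in the chart can be quantified per continent. A minimal sketch, assuming a frame like `test` with `Continent`, `Depression` and `Alcohol Disorder` columns; `correlation_by_group` is a hypothetical helper. Pearson's r here is computed on raw values, whereas the chart's x-axis is logarithmic, and pooling all years counts each country 28 times, so filtering to a single year (for example `test[test['Year'] == 2017]`) is probably the cleaner comparison.

```python
import pandas as pd


def correlation_by_group(frame, x, y, group="Continent"):
    """Pearson correlation between columns x and y, computed separately within each group."""
    return pd.Series(
        {name: g[x].corr(g[y]) for name, g in frame.groupby(group)},
        name=f"corr({x}, {y})",
    )
```

For example, `correlation_by_group(test[test['Year'] == 2017], 'Depression', 'Alcohol Disorder')` would give one coefficient per continent; values close to 0 would support the "weak" reading.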
#Scatter Graph
fig = px.scatter(test, x="Anxiety Disorder", y="Drug use Disorder", color="Continent", trendline="ols", animation_frame = 'Year',
hover_name="Country", log_x=True)
fig.update_layout(
title_text="Drug use Vs. Anxiety Disorder",
title_x=0.5,
)
fig.show()
Analyzing this Scatter Plot we can infer the following:
On this graph we compare Drug use Disorder and Anxiety Disorder across all countries (grouped by continent) between 1990 and 2017 to see how they relate to one another. The graph is interactive: legend entries can be toggled on and off to see the trends for one or several continents.
Looking at the trends for all continents while playing the timeline, we can see a strong positive correlation between drug use and anxiety disorders: countries with a higher prevalence of drug use disorder also tend to have a higher prevalence of anxiety disorder.
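Playing the animation shows the fitted line year by year; the same fit can be tabulated to check whether the slope really stays positive in every year. Sketch only: `yearly_ols` is a hypothetical helper that uses `numpy.polyfit` for a degree-1 least-squares fit on raw (not log-transformed) values, and it assumes at least two countries per year.

```python
import numpy as np
import pandas as pd


def yearly_ols(frame, x, y, year_col="Year"):
    """Fit y = slope * x + intercept separately for each year (ordinary least squares)."""
    rows = []
    for year, g in frame.groupby(year_col):
        # polyfit with deg=1 returns [slope, intercept]
        slope, intercept = np.polyfit(g[x].to_numpy(dtype=float),
                                      g[y].to_numpy(dtype=float), deg=1)
        rows.append({year_col: year, "slope": slope, "intercept": intercept,
                     "r": g[x].corr(g[y])})
    return pd.DataFrame(rows)
```

In the notebook, `yearly_ols(test, 'Anxiety Disorder', 'Drug use Disorder')` would return one row per year, so a consistently positive `slope` and `r` column would support the reading above.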
# Bar Graph by Country
# Calculating the mean percentage of each illness by country (1990-2017)
dfcountry = df.groupby(["Country","Continent","Code"],as_index=False)[['Schizophrenia','Bipolar Disorder','Eating Disorder','Anxiety Disorder','Depression']].mean()
country=dfcountry['Country'].to_numpy()
x=np.array(['Schizophrenia','Bipolar Disorder','Eating Disorder','Anxiety Disorder',
'Depression'])
fig = go.Figure()
# Dropdown buttons: the first entry hides every trace, then one entry per country
buttons = [{"label": "Select Country: ", "method": "update", "args": [{'visible': [False for tm in country]}]}]
for ct in country:
    # Take the 1990-2017 means from dfcountry (df would only give the first yearly row)
    countryData = dfcountry.loc[dfcountry['Country'] == ct]
    y = countryData[list(x)].values
    trace = go.Bar(x=x, y=y[0], name=ct, visible=False, hoverinfo="text", hovertext=y[0],
                   marker=dict(color=list(range(len(x))),  # one colour value per bar
                               colorscale='viridis')
                   )
    # Selecting a country shows only that country's trace
    button = {"label": ct, 'method': 'update',
              "args": [{'visible': [ct == ctx for ctx in country]}]}
    fig.add_trace(trace)
    buttons.append(button)
fig.update_layout(
{
"updatemenus":[
go.layout.Updatemenu(buttons=buttons, direction="down", pad={"r": 5, "t": 0}, showactive=True,
xanchor="right", x=1.3, yanchor="top", y=1.10)],
'title_text': 'Mental Illnesses Mean Percentages by Country (1990 - 2017)',
'xaxis': dict(title='Illness Type', tickangle=45),
'yaxis_title_text': 'Disease Percentage',
"width": 1000, "height": 700,
"autosize": True
}
)
fig.show()
This is an interactive bar chart:
For the country selected in the dropdown, the chart shows the mean percentage of each mental disorder over 1990-2017, so the disorders can be ranked against one another. Hovering over a bar shows the exact percentage for that illness.
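The first guiding question asks how countries rank for a specific illness, which the per-country dropdown does not show directly. A sketch of a ranking table, assuming a frame like `dfcountry` with a `Country` column and one mean-percentage column per illness; `rank_countries` is a hypothetical helper.

```python
import pandas as pd


def rank_countries(frame, disorder, top_n=10):
    """Rank countries by their mean prevalence of one disorder (rank 1 = highest)."""
    ranked = (frame[['Country', disorder]]
              .sort_values(disorder, ascending=False)
              .reset_index(drop=True))
    # method='min' gives tied countries the same (best) rank
    ranked['Rank'] = ranked[disorder].rank(ascending=False, method='min').astype(int)
    return ranked.head(top_n)
```

For instance, `rank_countries(dfcountry, 'Depression')` would list the ten countries with the highest mean depression percentage.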
# Choropleth map with a dropdown to choose which illness is displayed
diseases = dfcountry.loc[:,'Schizophrenia':'Depression']
fig = go.Figure()
# One hidden trace per illness; the dropdown buttons below toggle their visibility
for column in diseases:
    fig.add_trace(
        go.Choropleth(locations = dfcountry['Code'],
                      z = dfcountry[column],
                      text = dfcountry['Country'],
                      reversescale = True,
                      marker_line_color='black',
                      marker_line_width=.5,
                      colorbar_ticksuffix="%",  # values are percentages, so append "%"
                      colorbar_title ='Percentage',
                      visible=False,
                      )
    )
fig.update_layout(
updatemenus=[go.layout.Updatemenu(
active=0,
buttons=list(
[
dict(label = 'Select Illness',
method = 'update',
args = [{'visible': [False, False, False, False, False]}, # the index of True aligns with the indices of plot traces
{'title': 'Select Option From Dropdown',
'showlegend':True,
}],
),
dict(label = 'Schizophrenia',
method = 'update',
args = [{'visible': [True, False, False, False,False]}, # the index of True aligns with the indices of plot traces
{'title': 'Schizophrenia',
'showlegend':True}]),
dict(label = 'Bipolar Disorder',
method = 'update',
args = [{'visible': [False, True, False, False,False]},
{'title': 'Bipolar Disorder',
'showlegend':True}]),
dict(label = 'Eating Disorder',
method = 'update',
args = [{'visible': [False, False, True, False,False]},
{'title': 'Eating Disorder',
'showlegend':True}]),
dict(label = 'Anxiety Disorders',
method = 'update',
args = [{'visible': [False, False, False, True,False]},
{'title': 'Anxiety Disorders',
'showlegend':True}]),
dict(label = 'Depression',
method = 'update',
args = [{'visible': [False, False, False, False,True]},
{'title': 'Depression',
'showlegend':True}])
])
)
])
fig.show()
Analyzing the choropleth map above, we can infer the following:
From this geographic map we can see the percentage of each country's population suffering from a given illness, and the color scale gives a good visualization of the intensities around the world. The dropdown widget selects which illness is displayed.
Depression and Anxiety Disorder are the leading illnesses, since their color-scale ranges are much higher than those of the others.
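The five hand-written `visible` lists in the choropleth code must stay in sync with the order in which the traces were added. A sketch that generates them instead, assuming exactly one trace per illness, added in column order; `illness_buttons` is a hypothetical helper, and it uses the column names as labels (so "Anxiety Disorder" rather than "Anxiety Disorders").

```python
def illness_buttons(columns):
    """Build dropdown buttons that show exactly one choropleth trace at a time.

    Assumes trace i was added in the same order as columns[i].
    """
    n = len(columns)
    # First entry: hide everything until an illness is chosen
    buttons = [dict(label='Select Illness', method='update',
                    args=[{'visible': [False] * n},
                          {'title': 'Select Option From Dropdown'}])]
    for i, name in enumerate(columns):
        buttons.append(dict(label=name, method='update',
                            args=[{'visible': [j == i for j in range(n)]},
                                  {'title': name}]))
    return buttons
```

It could be passed as `go.layout.Updatemenu(active=0, buttons=illness_buttons(list(diseases.columns)))`, so adding a sixth illness column would not require editing any lists by hand.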
#Tree Map Graph 1
country_continents = dfcountry[dfcountry["Continent"].str.contains("none") == False]
fig = px.treemap(country_continents, path=['Continent','Country'], values='Depression',
color='Depression',
title="Depression Percentages Across Continents")
fig.show()
Analyzing this TreeMap above we can infer the following:
For this tree map our area of interest was Depression. From this tree map we can see that the top 3 continents ranking highest in Depression are:
Additionally, clicking a continent shows the countries in it, and hovering over a continent shows the sum of its countries' percentages for this illness. Because that sum grows with the number of countries, a continent's box size reflects how many countries it contains as well as their prevalence. The color scale indicates the intensity.
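Since the treemap sizes continents by the sum of their countries' percentages, a small table of country count, sum and mean helps separate "many countries" from "high prevalence". A sketch, assuming a frame like `dfcountry` with a `Continent` column in which unrecognised entities are marked `'none'`; `continent_summary` is a hypothetical helper.

```python
import pandas as pd


def continent_summary(frame, value):
    """Number of countries plus summed and mean prevalence of `value` per continent."""
    return (frame[frame['Continent'] != 'none']
            .groupby('Continent')[value]
            .agg(['count', 'sum', 'mean'])
            .sort_values('mean', ascending=False))  # rank by average, not by total
```

Calling `continent_summary(dfcountry, 'Depression')` or `continent_summary(dfcountry, 'Anxiety Disorder')` would order continents by their average country-level prevalence rather than by box size.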
#Tree Map Graph 2
country_continents = dfcountry[dfcountry["Continent"].str.contains("none") == False]
fig = px.treemap(country_continents, path=['Continent','Country'], values='Anxiety Disorder',
color='Anxiety Disorder',
title="Anxiety Across Continents")
fig.show()
Analyzing this TreeMap above we can infer the following:
For this tree map our area of interest was Anxiety Disorder. From this tree map we can see that the top 3 continents ranking highest in Anxiety Disorder are:
As with the depression tree map, clicking a continent shows its countries, hovering over a continent shows the sum of its countries' percentages, and the color scale indicates the intensity.
#Suppress pandas' SettingWithCopyWarning when modifying copies or slices of DataFrames
pd.set_option('mode.chained_assignment', None)
# Importing the dataset "depression by education level" for answering the third and fourth guiding questions
df=pd.read_excel("Mental health Depression disorder Data.xlsx","depression-by-level-of-educatio")
df.head(5)
|  | Entity | Code | Year | All levels (active) (%) | All levels (employed) (%) | All levels (total) (%) | Below upper secondary (active) (%) | Below upper secondary (employed) (%) | Below upper secondary (total) (%) | Tertiary (active) (%) | Tertiary (employed) (%) | Tertiary (total) (%) | Upper secondary & post-secondary non-tertiary (active) (%) | Upper secondary & post-secondary non-tertiary (employed) (%) | Upper secondary & post-secondary non-tertiary (total) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Austria | AUT | 2014.0 | 6.5 | 4.7 | 7.7 | 15.5 | 9.0 | 15.2 | 4.3 | 3.5 | 5.5 | 5.5 | 4.2 | 6.7 |
| 1 | Belgium | BEL | 2014.0 | 5.0 | 4.1 | 7.1 | 7.1 | 4.8 | 11.6 | 3.7 | 3.3 | 4.2 | 5.7 | 5.0 | 7.5 |
| 2 | Czech Republic | CZE | 2014.0 | 3.0 | 2.6 | 4.0 | 2.1 | 2.5 | 6.0 | 1.7 | 1.7 | 2.0 | 3.5 | 3.0 | 4.4 |
| 3 | Denmark | DNK | 2014.0 | 6.7 | 5.7 | 8.3 | 10.4 | 6.5 | 15.5 | 5.7 | 4.7 | 6.7 | 7.4 | 6.9 | 8.8 |
| 4 | Estonia | EST | 2014.0 | 3.8 | 3.8 | 5.1 | 4.7 | 4.7 | 6.4 | 3.6 | 3.6 | 4.3 | 3.7 | 3.8 | 5.2 |
df.tail(5)
|  | Entity | Code | Year | All levels (active) (%) | All levels (employed) (%) | All levels (total) (%) | Below upper secondary (active) (%) | Below upper secondary (employed) (%) | Below upper secondary (total) (%) | Tertiary (active) (%) | Tertiary (employed) (%) | Tertiary (total) (%) | Upper secondary & post-secondary non-tertiary (active) (%) | Upper secondary & post-secondary non-tertiary (employed) (%) | Upper secondary & post-secondary non-tertiary (total) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Slovenia | SVN | 2014.0 | 7.6 | 6.0 | 8.4 | 12.3 | 10.2 | 12.6 | 6.2 | 5.8 | 6.6 | 7.1 | 5.4 | 7.7 |
| 22 | Spain | ESP | 2014.0 | 5.5 | 4.1 | 7.1 | 7.5 | 5.2 | 9.7 | 3.3 | 2.7 | 3.5 | 5.8 | 5.1 | 7.1 |
| 23 | Sweden | SWE | 2014.0 | 8.4 | 8.0 | 9.9 | 8.8 | 8.2 | 11.4 | 7.9 | 7.8 | 9.0 | 9.2 | 8.7 | 10.6 |
| 24 | Turkey | TUR | 2014.0 | 10.2 | 9.6 | 12.5 | 10.5 | 10.0 | 13.3 | 9.1 | 8.2 | 9.5 | 12.9 | 12.5 | 12.5 |
| 25 | United Kingdom | GBR | 2014.0 | 7.4 | 6.3 | 9.9 | 11.0 | 8.1 | 15.2 | 5.7 | 5.3 | 7.1 | 8.3 | 7.1 | 10.7 |
df.dtypes
| Column | dtype |
|---|---|
| Entity | object |
| Code | object |
| Year | float64 |
| All levels (active) (%) | float64 |
| All levels (employed) (%) | float64 |
| All levels (total) (%) | float64 |
| Below upper secondary (active) (%) | float64 |
| Below upper secondary (employed) (%) | float64 |
| Below upper secondary (total) (%) | float64 |
| Tertiary (active) (%) | float64 |
| Tertiary (employed) (%) | float64 |
| Tertiary (total) (%) | float64 |
| Upper secondary & post-secondary non-tertiary (active) (%) | float64 |
| Upper secondary & post-secondary non-tertiary (employed) (%) | float64 |
| Upper secondary & post-secondary non-tertiary (total) (%) | float64 |
#Renaming the columns to be more readable and easier to work with as variables
df.rename(
columns=({ 'Entity': 'Country', 'All levels (active) (%)': 'All Active Job Searchers','All levels (employed) (%)': 'All Employed',
'All levels (total) (%)':'All Levels'}),
inplace=True,
)
df=df[['Country','All Active Job Searchers','All Employed','All Levels','Below upper secondary (active) (%)',
'Below upper secondary (employed) (%)','Below upper secondary (total) (%)','Tertiary (active) (%)','Tertiary (employed) (%)',
'Tertiary (total) (%)','Upper secondary & post-secondary non-tertiary (active) (%)','Upper secondary & post-secondary non-tertiary (employed) (%)',
'Upper secondary & post-secondary non-tertiary (total) (%)'
]]
df.head(5)
|  | Country | All Active Job Searchers | All Employed | All Levels | Below upper secondary (active) (%) | Below upper secondary (employed) (%) | Below upper secondary (total) (%) | Tertiary (active) (%) | Tertiary (employed) (%) | Tertiary (total) (%) | Upper secondary & post-secondary non-tertiary (active) (%) | Upper secondary & post-secondary non-tertiary (employed) (%) | Upper secondary & post-secondary non-tertiary (total) (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Austria | 6.5 | 4.7 | 7.7 | 15.5 | 9.0 | 15.2 | 4.3 | 3.5 | 5.5 | 5.5 | 4.2 | 6.7 |
| 1 | Belgium | 5.0 | 4.1 | 7.1 | 7.1 | 4.8 | 11.6 | 3.7 | 3.3 | 4.2 | 5.7 | 5.0 | 7.5 |
| 2 | Czech Republic | 3.0 | 2.6 | 4.0 | 2.1 | 2.5 | 6.0 | 1.7 | 1.7 | 2.0 | 3.5 | 3.0 | 4.4 |
| 3 | Denmark | 6.7 | 5.7 | 8.3 | 10.4 | 6.5 | 15.5 | 5.7 | 4.7 | 6.7 | 7.4 | 6.9 | 8.8 |
| 4 | Estonia | 3.8 | 3.8 | 5.1 | 4.7 | 4.7 | 6.4 | 3.6 | 3.6 | 4.3 | 3.7 | 3.8 | 5.2 |
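The nine education columns encode two variables in their names: education level and labour status (active, employed, total). A sketch that reshapes them into long format with `DataFrame.melt` and a regular expression, assuming the column naming shown above; `education_long` is a hypothetical helper. The renamed "All ..." columns no longer end in "(%)", so they are left out.

```python
import pandas as pd


def education_long(frame):
    """Reshape the wide education columns into one row per country, education level and status."""
    value_cols = [c for c in frame.columns if c.endswith('(%)')]
    long = frame.melt(id_vars='Country', value_vars=value_cols,
                      var_name='Group', value_name='Depression (%)')
    # e.g. "Tertiary (employed) (%)" -> Education="Tertiary", Status="employed"
    parts = long['Group'].str.extract(
        r'^(?P<Education>.+) \((?P<Status>active|employed|total)\) \(%\)$')
    return pd.concat([long.drop(columns='Group'), parts], axis=1)
```

A long frame like this can feed a grouped bar chart for one country, e.g. `px.bar(..., x='Education', y='Depression (%)', color='Status', barmode='group')`, instead of nine separately named bars.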
#Sorting countries by overall depression percentage (highest first)
sorted_df = df.sort_values(by='All Levels', ascending=False)
#Bar Graph
fig = px.bar(sorted_df,x='Country', y='All Levels', color='All Levels', title="Depression Percentages by Country",
labels={'All Levels':'Overall Depression Percentage'})
fig.show()
Analyzing this Bar Graph above we can infer the following:
Since this dataset only contains data from 2014 for European countries, we use a bar chart to show the percentage of depression (across all educational levels) in each country, from highest to lowest.
According to the results of the Gallup-Healthways Global Well-Being Index of 2014, Austria ranked number 4 among the happiest countries that year. Happiness was measured using 5 key elements: purpose well-being, financial well-being, social well-being, community well-being and physical well-being. In our bar chart, Austria falls towards the lower end of depression levels among the European countries, which suggests overall positive mental health in Austria.
#Choropleth Graph
geo=px.choropleth(df,locations='Country',locationmode='country names',
color='All Levels',hover_name='Country',projection='natural earth',
title='Geo Map for Depression Data')
geo.show()
Above is a geographic visualization of all the European countries included in the dataset, with each country's overall depression percentage shown on the color scale.
# Interactive bar chart: depression by education level and labour status, one country at a time
country=df['Country'].to_numpy()
x=np.array(['Below upper secondary (active) (%)', 'Upper secondary & post-secondary non-tertiary (active) (%)',
'Tertiary (active) (%)', 'Below upper secondary (employed) (%)',
'Upper secondary & post-secondary non-tertiary (employed) (%)', 'Tertiary (employed) (%)',
'Below upper secondary (total) (%)', 'Upper secondary & post-secondary non-tertiary (total) (%)',
'Tertiary (total) (%)'])
fig = go.Figure()
# Dropdown buttons: the first entry hides every trace, then one entry per country
buttons = [{"label": "Select Country: ", "method": "update", "args": [{'visible': [False for tm in country]}]}]
for ct in country:
    countryData = df.loc[df['Country'] == ct]
    # One bar per education/status group, in the order given by x
    y = countryData[list(x)].values
    trace = go.Bar(x=x, y=y[0], name=ct, visible=False, hoverinfo="text", hovertext=y[0],
                   marker=dict(color=list(range(len(x))),  # one colour value per bar
                               colorscale='viridis'))
    # Selecting a country shows only that country's trace
    button = {"label": ct, 'method': 'update',
              "args": [{'visible': [ct == ctx for ctx in country]}]}
    fig.add_trace(trace)
    buttons.append(button)
fig.update_layout(
{
"updatemenus":[
go.layout.Updatemenu(buttons=buttons, direction="down", pad={"r": 5, "t": 0}, showactive=True,
xanchor="right", x=1.3, yanchor="top", y=1.10)],
'title_text': 'Depression Rates for Education Groups by Country',
'xaxis': dict(title='Groups Segregated by Education levels', tickangle=45),
'yaxis_title_text': 'Percentage of People with Depression',
"width": 1000, "height": 700,
"autosize": True
}
)
fig.show()
Analyzing this Bar Graph above we can infer the following:
This bar graph analyzes depression levels across education levels. The dropdown changes the country; for each education level (below upper secondary, upper secondary & post-secondary non-tertiary, and tertiary), the bars show the percentage of depression among active job seekers, employed people, and the total population.
After analyzing every country, we can infer that people in the 'Below upper secondary' bracket, which represents those who have not finished high school education, have the highest levels of depression. Additionally, employed people are less likely to be depressed than their counterparts among active job seekers.
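The "after analyzing every country" check can also be done programmatically rather than by flipping through the dropdown. A sketch, assuming the wide `df` with the original education column names; `fraction_highest` is a hypothetical helper, and it counts ties as "highest".

```python
import pandas as pd


def fraction_highest(frame, target, others):
    """Share of rows (countries) in which `target` is at least as high as every column in `others`."""
    return float((frame[target] >= frame[others].max(axis=1)).mean())
```

For example, `fraction_highest(df, 'Below upper secondary (total) (%)', ['Upper secondary & post-secondary non-tertiary (total) (%)', 'Tertiary (total) (%)'])` returns 1.0 only if the claim holds in every country. The employed-versus-active comparison can be checked the same way with `(df['All Employed'] < df['All Active Job Searchers']).mean()`; some countries tie (Estonia shows 3.8 for both in the table above), so a strict comparison will not give 1.0.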
# Importing the dataset from the Tech Survey CSV file for answering the fifth guiding question, which focuses on data collected
# from a survey of people in the tech field
tech=pd.read_csv("MENTAL HEALTH DATASET.csv")
tech.head(5)
|  | Timestamp | Age | Gender | Country | state | self_employed | family_history | treatment | work_interfere | no_employees | ... | leave | mental_health_consequence | phys_health_consequence | coworkers | supervisor | mental_health_interview | phys_health_interview | mental_vs_physical | obs_consequence | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2014-08-27 11:29:31 | 37 | Female | United States | IL | NaN | No | Yes | Often | 6-25 | ... | Somewhat easy | No | No | Some of them | Yes | No | Maybe | Yes | No | NaN |
| 1 | 2014-08-27 11:29:37 | 44 | M | United States | IN | NaN | No | No | Rarely | More than 1000 | ... | Don't know | Maybe | No | No | No | No | No | Don't know | No | NaN |
| 2 | 2014-08-27 11:29:44 | 32 | Male | Canada | NaN | NaN | No | No | Rarely | 6-25 | ... | Somewhat difficult | No | No | Yes | Yes | Yes | Yes | No | No | NaN |
| 3 | 2014-08-27 11:29:46 | 31 | Male | United Kingdom | NaN | NaN | Yes | Yes | Often | 26-100 | ... | Somewhat difficult | Yes | Yes | Some of them | No | Maybe | Maybe | No | Yes | NaN |
| 4 | 2014-08-27 11:30:22 | 31 | Male | United States | TX | NaN | No | No | Never | 100-500 | ... | Don't know | No | No | Some of them | Yes | Yes | Yes | Don't know | No | NaN |
5 rows × 27 columns
tech.tail(5)
|  | Timestamp | Age | Gender | Country | state | self_employed | family_history | treatment | work_interfere | no_employees | ... | leave | mental_health_consequence | phys_health_consequence | coworkers | supervisor | mental_health_interview | phys_health_interview | mental_vs_physical | obs_consequence | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1254 | 2015-09-12 11:17:21 | 26 | male | United Kingdom | NaN | No | No | Yes | NaN | 26-100 | ... | Somewhat easy | No | No | Some of them | Some of them | No | No | Don't know | No | NaN |
| 1255 | 2015-09-26 01:07:35 | 32 | Male | United States | IL | No | Yes | Yes | Often | 26-100 | ... | Somewhat difficult | No | No | Some of them | Yes | No | No | Yes | No | NaN |
| 1256 | 2015-11-07 12:36:58 | 34 | male | United States | CA | No | Yes | Yes | Sometimes | More than 1000 | ... | Somewhat difficult | Yes | Yes | No | No | No | No | No | No | NaN |
| 1257 | 2015-11-30 21:25:06 | 46 | f | United States | NC | No | No | No | NaN | 100-500 | ... | Don't know | Yes | No | No | No | No | No | No | No | NaN |
| 1258 | 2016-02-01 23:04:31 | 25 | Male | United States | IL | No | Yes | Yes | Sometimes | 26-100 | ... | Don't know | Maybe | No | Some of them | No | No | No | Don't know | No | NaN |
5 rows × 27 columns
#Dropping columns not required for our analysis
tech = tech.drop(['Timestamp', 'obs_consequence','coworkers','mental_health_interview','state','phys_health_interview',
'Country','mental_vs_physical','anonymity','work_interfere','phys_health_consequence'], axis=1)
tech.head(1)
|  | Age | Gender | self_employed | family_history | treatment | no_employees | remote_work | tech_company | benefits | care_options | wellness_program | seek_help | leave | mental_health_consequence | supervisor | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37 | Female | NaN | No | Yes | 6-25 | No | Yes | Yes | Not sure | No | Yes | Somewhat easy | No | Yes | NaN |
#Renaming columns
tech.rename(
columns=({ 'self_employed': 'Self Employed', 'family_history': 'Family History','treatment': 'Treatment',
'no_employees':'Number of Employees','remote_work':'Remote Available','tech_company':'Tech Company',
'benefits':'Benefits','care_options':'Care Options','wellness_program':'Wellness Program','seek_help':'Seek Help',
'leave':'Leave','mental_health_consequence':'Mental Health Consequences','supervisor':'Supervisor',
'comments':'Comments'}),
inplace=True,)
#Treating 'Maybe' as 'Yes', then encoding Yes/No as 1/0
tech.loc[(tech['Mental Health Consequences'] == 'Maybe'), 'Mental Health Consequences']='Yes'
tech.loc[(tech['Mental Health Consequences'] == 'Yes'), 'Mental Health Consequences']=1
tech.loc[(tech['Mental Health Consequences'] == 'No'), 'Mental Health Consequences']=0
# Data wrangling on the Gender column: map the free-text answers to Male/Female and put the rest in an "Other" category
print("Before cleaning Gender column")
print(tech["Gender"].unique())
male_labels = ['M', 'Male', 'm', 'male', 'maile', 'something kinda male?', 'Cis Male', 'Mal',
               'Male (CIS)', 'Make', 'Guy (-ish) ^_^', 'Male ', 'Man', 'msle', 'Mail',
               'cis male', 'Malr', 'Cis Man', 'Male-ish']
female_labels = ['F', 'Female', 'f', 'female', 'Cis Female', 'Woman', 'Femake', 'woman',
                 'Female ', 'cis-female/femme', 'Female (cis)', 'femail']
other_labels = ['Trans-female', 'queer/she/they', 'non-binary', 'Nah', 'All', 'Enby', 'fluid',
                'Genderqueer', 'Androgyne', 'Agender', 'male leaning androgynous', 'Trans woman',
                'Neuter', 'Female (trans)', 'queer', 'A little about you', 'p',
                'ostensibly male, unsure what that really means']
# isin() replaces the long chains of == comparisons joined with |
tech.loc[tech['Gender'].isin(male_labels), 'Gender'] = 'Male'
tech.loc[tech['Gender'].isin(female_labels), 'Gender'] = 'Female'
tech.loc[tech['Gender'].isin(other_labels), 'Gender'] = 'Other'
print("\nAfter cleaning Gender column")
print(tech["Gender"].unique())
Before cleaning Gender column
['Female' 'M' 'Male' 'male' 'female' 'm' 'Male-ish' 'maile' 'Trans-female' 'Cis Female' 'F' 'something kinda male?' 'Cis Male' 'Woman' 'f' 'Mal' 'Male (CIS)' 'queer/she/they' 'non-binary' 'Femake' 'woman' 'Make' 'Nah' 'All' 'Enby' 'fluid' 'Genderqueer' 'Female ' 'Androgyne' 'Agender' 'cis-female/femme' 'Guy (-ish) ^_^' 'male leaning androgynous' 'Male ' 'Man' 'Trans woman' 'msle' 'Neuter' 'Female (trans)' 'queer' 'Female (cis)' 'Mail' 'cis male' 'A little about you' 'Malr' 'p' 'femail' 'Cis Man' 'ostensibly male, unsure what that really means']

After cleaning Gender column
['Female' 'Male' 'Other']
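This is an alternative to listing every spelling by hand. The sketch below first normalises each answer by stripping whitespace and lower-casing it, then looks it up in a set, so variants such as 'Male ', 'male' and 'MALE' collapse to one key. The sets `MALE` and `FEMALE` and the helper `normalize_gender` are hypothetical names. Unlike the explicit spellings listed above, any unlisted answer falls into 'Other'. That is an assumption about how unseen typos should be handled.

```python
import pandas as pd

# Hypothetical lookup sets holding normalised (stripped, lower-cased) spellings
MALE = {"m", "male", "maile", "mal", "make", "man", "msle", "mail", "malr", "male-ish",
        "cis male", "cis man", "male (cis)", "something kinda male?", "guy (-ish) ^_^"}
FEMALE = {"f", "female", "woman", "femake", "femail", "cis female",
          "cis-female/femme", "female (cis)"}


def normalize_gender(raw):
    """Collapse one free-text gender answer into 'Male', 'Female' or 'Other'."""
    if pd.isna(raw):
        return raw  # leave missing answers missing
    key = str(raw).strip().lower()
    if key in MALE:
        return "Male"
    if key in FEMALE:
        return "Female"
    return "Other"  # assumption: every unlisted answer is grouped as Other


sample = pd.Series(["Male ", "female", "Cis Man", "Trans woman", "Enby", "Femake"])
cleaned = sample.map(normalize_gender)
print(cleaned.tolist())  # ['Male', 'Female', 'Male', 'Other', 'Other', 'Female']
```

On the survey this would be applied as `tech["Gender"] = tech["Gender"].map(normalize_gender)`. The trade-off is that a new misspelling of "male" would silently land in 'Other' rather than being noticed.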
# Dropping missing and ambiguous responses
tech=tech[tech["Wellness Program"]!="Don't know"]
tech=tech[tech["Benefits"]!="Don't know"]
# Missing values are float NaN, not the string "NaN", so notna() is needed to drop them
tech=tech[tech["Self Employed"].notna()]
tech=tech[tech["Seek Help"]!="Don't know"]
tech=tech[tech["Care Options"]!="Not sure"]
tech=tech[tech["Leave"]!="Don't know"]
#Modifying Leave column to feature Easy/Difficult data only
tech.loc[(tech['Leave'] == 'Somewhat difficult'), 'Leave'] = "Difficult"
tech.loc[(tech['Leave'] == 'Very difficult'), 'Leave'] = "Difficult"
tech.loc[(tech['Leave'] == 'Somewhat easy'), 'Leave'] = "Easy"
tech.loc[(tech['Leave'] == 'Very easy'), 'Leave'] = "Easy"
#Adding updated columns to new Dataframe
tech1=tech[[ 'Gender', 'Self Employed', 'Family History', 'Treatment', 'Remote Available'
,'Benefits', 'Care Options', 'Wellness Program', 'Seek Help', 'Leave','Supervisor',
'Mental Health Consequences'
]]
tech1.head(5)
| | Gender | Self Employed | Family History | Treatment | Remote Available | Benefits | Care Options | Wellness Program | Seek Help | Leave | Supervisor | Mental Health Consequences |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Male | NaN | No | No | No | No | No | No | No | Difficult | Yes | 0 |
| 3 | Male | NaN | Yes | Yes | No | No | Yes | No | No | Difficult | No | 1 |
| 6 | Female | NaN | Yes | Yes | Yes | No | No | No | No | Difficult | No | 1 |
| 8 | Female | NaN | Yes | Yes | No | Yes | Yes | No | No | Difficult | Yes | 1 |
| 12 | Female | NaN | Yes | Yes | No | Yes | Yes | No | No | Difficult | Yes | 1 |
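The filtering and recoding steps above can be sketched on a small made-up frame. The values are invented for illustration. The sketch assumes `notna()`, `Series.replace` and `Series.map` are acceptable substitutes for the row-by-row `.loc` assignments. It also shows why comparing a column with the string "NaN" keeps rows whose value is actually missing.

```python
import numpy as np
import pandas as pd

# Made-up survey rows (illustrative only)
survey = pd.DataFrame({
    "Self Employed": ["No", np.nan, "Yes", "No"],
    "Benefits": ["Yes", "No", "Don't know", "No"],
    "Leave": ["Somewhat difficult", "Very difficult", "Somewhat easy", "Very easy"],
    "Mental Health Consequences": ["Maybe", "No", "Yes", "No"],
})

# Comparing with the string "NaN" keeps every row: a real missing value is not equal to "NaN"
print(len(survey[survey["Self Employed"] != "NaN"]))  # 4

# notna() drops the missing value; ambiguous answers are removed in the same mask
clean = survey[survey["Self Employed"].notna() & (survey["Benefits"] != "Don't know")].copy()

# Collapse the four ease-of-leave answers into Easy/Difficult
leave_map = {"Somewhat difficult": "Difficult", "Very difficult": "Difficult",
             "Somewhat easy": "Easy", "Very easy": "Easy"}
clean["Leave"] = clean["Leave"].replace(leave_map)

# Treat "Maybe" as "Yes", then encode Yes/No as 1/0
clean["Mental Health Consequences"] = clean["Mental Health Consequences"].map(
    {"Yes": 1, "Maybe": 1, "No": 0})

print(clean)
```

Using `map` with a dictionary produces an integer column, whereas assigning 1/0 through `.loc` leaves the column as object dtype. `map` also turns any unlisted answer into NaN, which makes unexpected values easy to spot.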
#Segregating Dataframe based on personal or work related environment
general=tech1[[ 'Gender', 'Family History', 'Treatment', 'Seek Help', 'Mental Health Consequences']]
work=tech1[[ 'Gender', 'Wellness Program', 'Leave', 'Supervisor','Mental Health Consequences']]
work=work[work["Supervisor"]!= "Some of them" ]
#Parallel Categories Graph 1
fig = px.parallel_categories(general , dimensions=['Gender', 'Family History', 'Treatment','Seek Help','Mental Health Consequences'],
color="Mental Health Consequences",
labels={'Gender':'Gender', 'Family History':'Family History', 'Treatment':'Treatment','Seek Help':'Seek Help',
'Mental Health Consequences':'Mental Health Consequences'},
title="General Factors for Workplace Survey")
fig.show()
Analyzing the parallel categories graph above, we can infer the following:
Since this dataset consists of categorical answers such as Yes, No and Don't know, we thought it best to visualize it with a parallel categories graph.
We can make inferences by hovering over any one of the sections. When we hover toward the bottom, we can see, for example, a total of 40 male respondents who still thought that discussing a mental health issue with their employer would have negative consequences.
When we hover toward the top, we can see, for example, a total of 37 male respondents who thought that discussing a mental health issue with their employer would not have negative consequences.
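The counts read off by hovering can also be computed directly. This is a minimal sketch that assumes a DataFrame shaped like `general`; the rows here are made up. Grouping by all five axes and counting rows gives the number of respondents along each complete path through the graph. A two-axis crosstab gives a coarser view. Plotly's hover on an individual ribbon may aggregate differently, so treat this as a cross-check rather than a reproduction of the figure.

```python
import pandas as pd

# Made-up rows shaped like the `general` DataFrame (1 = expects negative consequences)
general_demo = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "Family History": ["Yes", "No", "Yes", "Yes", "No", "No"],
    "Treatment": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
    "Seek Help": ["No", "No", "Yes", "No", "No", "Yes"],
    "Mental Health Consequences": [1, 0, 1, 1, 0, 0],
})

axes = ["Gender", "Family History", "Treatment", "Seek Help", "Mental Health Consequences"]

# Number of respondents along each complete path through the five axes
path_counts = general_demo.groupby(axes).size().rename("Respondents").reset_index()

# Simpler two-axis view: respondents per gender and consequence answer
by_gender = pd.crosstab(general_demo["Gender"], general_demo["Mental Health Consequences"])

print(path_counts)
print(by_gender)
```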
#Parallel Categories Graph 2
fig = px.parallel_categories(work,dimensions=['Gender', 'Wellness Program', 'Leave', 'Supervisor','Mental Health Consequences'],
color="Mental Health Consequences",
labels={'Gender':'Gender', 'Wellness Program':'Wellness Program', 'Leave':'Leave','Supervisor':'Supervisor',
'Mental Health Consequences':'Mental Health Consequences'},
title="Work Related Factors for Workplace Survey")
fig.show()
When we hover toward the top, we can see, for example, a total of 26 male respondents who thought that discussing a mental health issue with their employer would have negative consequences.
From the two parallel categories graphs, we have visualized various factors that may be responsible for the deteriorating mental health of an individual in both personal and professional settings. The graphs suggest that work-related factors are associated with mental health much more strongly than personal factors. Work environment factors, such as how easy it is to take leave and whether respondents are willing to discuss mental health with their supervisor, are associated with far fewer mental health concerns (measured here as respondents expecting negative consequences from discussing mental health with their employer). On the other hand, irrespective of prior family history or previous treatment sought, the prevalence of mental health concerns remains consistent across both possibilities.
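To put a number on this comparison, one option is to compute the share of respondents expecting negative consequences within each level of a factor. The helper `consequence_rate` and the rows below are hypothetical. The helper casts the outcome to `int` because the `.loc` recoding leaves the 1/0 column as object dtype, and averaging an object column fails in recent pandas versions. A larger gap between levels of a factor would support the claim that it matters more. These are associations, not causes.

```python
import pandas as pd

# Made-up rows shaped like the `work` DataFrame (1 = expects negative consequences)
work_demo = pd.DataFrame({
    "Leave": ["Easy", "Easy", "Difficult", "Difficult", "Easy", "Difficult"],
    "Supervisor": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "Mental Health Consequences": [0, 0, 1, 1, 1, 0],
})


def consequence_rate(df, factor):
    """Share of respondents in each level of `factor` who expect negative consequences."""
    outcome = df["Mental Health Consequences"].astype(int)
    return outcome.groupby(df[factor]).mean()


for factor in ["Leave", "Supervisor"]:
    print(factor, consequence_rate(work_demo, factor).round(2).to_dict())
```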
#Reading required sheets from the excel file of the dataset
data_by_age = pd.read_excel("Mental health Depression disorder Data.xlsx","prevalence-of-depression-by-age")
data_by_gender = pd.read_excel("Mental health Depression disorder Data.xlsx","prevalence-of-depression-males-")
#reading the Country-Continent mapping csv file
country_mapping = pd.read_csv('countryContinent.csv', encoding = "ISO-8859-1")
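#Selecting columns from the csv for joining onto our dataset
#.copy() makes an independent DataFrame, so the in-place rename below does not raise a SettingWithCopyWarning
country_mapping_v2 = country_mapping[["country", "continent","sub_region"]].copy()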
#renaming columns for both the datasets
data_by_age.rename(
columns=({ 'Entity': 'Country','Code': 'Country_Code','20-24 years old (%)' : '20-24 years','10-14 years old (%)' : '10-14 years','All ages (%)' : 'All age groups','70+ years old (%)' : '70+ years','30-34 years old (%)' : '30-34 years','15-19 years old (%)' : '15-19 years','25-29 years old (%)' : '25-29 years','50-69 years old (%)' : '50-69 years','Age-standardized (%)' : 'Age_Standardized','15-49 years old (%)' : '15-49 years'}),
inplace=True,
)
data_by_gender.rename(
columns=({ 'Entity': 'Country','Code': 'Country_Code','Prevalence in males (%)': 'Males','Prevalence in females (%)': 'Females','Population': 'Total_Population'}),
inplace=True,
)
#renaming columns for the mapping file
country_mapping_v2.rename(
columns = ({'country':'Country','continent':'Continent','sub_region':'Sub_region'}),
inplace = True,
)
#Performing a left join on the dataset ON COUNTRY to import its respective Continent and Sub-regions
data_by_gender = data_by_gender.merge(country_mapping_v2, on='Country', how='left')
#Melting (unpivoting) the datasets from wide to long format for easier grouping and plotting
new_data = pd.melt(data_by_age, id_vars =['Country','Country_Code','Year'], value_vars =['20-24 years', '10-14 years',
'All age groups','70+ years',
'30-34 years','15-19 years',
'25-29 years','50-69 years',
'Age_Standardized','15-49 years'])
data_by_gender_v2 = pd.melt(data_by_gender, id_vars =['Continent','Sub_region','Country','Country_Code','Year','Total_Population'], value_vars =['Males','Females'])
#Renaming the melted columns for clearer understanding
data_by_gender_v2.rename(
columns = ({'variable':'Gender','value':'Depression_percentage'}),
inplace = True,
)
#Dropping all the NaN values to accurately visualize the results
data_by_gender_v2 = data_by_gender_v2.dropna()
#Creating a copy of the percentage multiplied by 100, used as the sector size in the sunburst graph
new_data['value_x100'] = new_data['value']*100
new_data.rename(
columns = ({'variable':'Age_Group','value':'Percent_Value','value_x100':'Percent_Value_x100'}),
inplace = True,
)
#Joining the dataset with the Country-Continent mapping
new_data = new_data.merge(country_mapping_v2, on='Country', how='left')
#data_by_gender_v2 = data_by_gender_v2.merge(country_mapping_v2, on='Country', how='left')
display(new_data)
| | Country | Country_Code | Year | Age_Group | Percent_Value | Percent_Value_x100 | Continent | Sub_region |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990.0 | 20-24 years | 4.417802 | 441.780176 | Asia | Southern Asia |
| 1 | Afghanistan | AFG | 1991.0 | 20-24 years | 4.433524 | 443.352425 | Asia | Southern Asia |
| 2 | Afghanistan | AFG | 1992.0 | 20-24 years | 4.453689 | 445.368919 | Asia | Southern Asia |
| 3 | Afghanistan | AFG | 1993.0 | 20-24 years | 4.464517 | 446.451666 | Asia | Southern Asia |
| 4 | Afghanistan | AFG | 1994.0 | 20-24 years | 4.462960 | 446.295963 | Asia | Southern Asia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 64675 | Zimbabwe | ZWE | 2013.0 | 15-49 years | 3.133858 | 313.385759 | Africa | Eastern Africa |
| 64676 | Zimbabwe | ZWE | 2014.0 | 15-49 years | 3.153508 | 315.350781 | Africa | Eastern Africa |
| 64677 | Zimbabwe | ZWE | 2015.0 | 15-49 years | 3.179233 | 317.923295 | Africa | Eastern Africa |
| 64678 | Zimbabwe | ZWE | 2016.0 | 15-49 years | 3.206184 | 320.618417 | Africa | Eastern Africa |
| 64679 | Zimbabwe | ZWE | 2017.0 | 15-49 years | 3.233777 | 323.377712 | Africa | Eastern Africa |
64680 rows × 8 columns
data_by_gender_v2.head(5)
| | Continent | Sub_region | Country | Country_Code | Year | Total_Population | Gender | Depression_percentage |
|---|---|---|---|---|---|---|---|---|
| 190 | Asia | Southern Asia | Afghanistan | AFG | 1990.0 | 12412000.0 | Males | 3.499982 |
| 191 | Asia | Southern Asia | Afghanistan | AFG | 1991.0 | 13299000.0 | Males | 3.503947 |
| 192 | Asia | Southern Asia | Afghanistan | AFG | 1992.0 | 14486000.0 | Males | 3.508912 |
| 193 | Asia | Southern Asia | Afghanistan | AFG | 1993.0 | 15817000.0 | Males | 3.513429 |
| 194 | Asia | Southern Asia | Afghanistan | AFG | 1994.0 | 17076000.0 | Males | 3.515578 |
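The reshaping and joining above can be illustrated on a two-country toy table. The country names, mapping and numbers are made up. `var_name` and `value_name` are standard `pd.melt` parameters, used here in place of the separate rename step. The unmatched country shows why a left join followed by `dropna()` removes rows whose names do not appear in the mapping file.

```python
import pandas as pd

# Made-up wide table: one row per country-year, one column per gender
wide = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "Year": [1990, 1990],
    "Males": [3.5, 2.9],
    "Females": [4.1, 3.8],
})
# Made-up mapping that has no entry for "Country B"
mapping = pd.DataFrame({"Country": ["Country A"], "Continent": ["Asia"]})

# melt() unpivots wide -> long: one row per country, year and gender
long_df = pd.melt(wide, id_vars=["Country", "Year"], value_vars=["Males", "Females"],
                  var_name="Gender", value_name="Depression_percentage")

# A left join keeps every row; countries missing from the mapping get NaN as Continent
long_df = long_df.merge(mapping, on="Country", how="left")
print(long_df)

# dropna() then removes the unmatched rows
print(long_df.dropna().shape)  # (2, 5)
```

If country names differ in spelling between the two files, those rows are dropped the same way. Checking `long_df["Continent"].isna()` before dropping shows which names failed to match.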
Analyzing Plots
#Line Graph for different age groups
new_data_filtered_plot5 = new_data.groupby(["Year","Age_Group"],as_index=False)[['Percent_Value']].mean()
# Keep only the 10-14, 15-49, 50-69 and 70+ groups (drop the aggregates and the 15-34 five-year bands)
new_data_filtered_plot5 = new_data_filtered_plot5[~new_data_filtered_plot5.Age_Group.isin(
    ["All age groups", "Age_Standardized", "15-19 years", "20-24 years", "25-29 years", "30-34 years"])]
plot5 = px.line(new_data_filtered_plot5, x="Year", y="Percent_Value", color="Age_Group",line_shape="spline", render_mode="svg",
labels={"Percent_Value": "Depression Percentage","Age_Group": "Age Groups"},
title="Depression Percentage Based on Age Groups")
plot5.show()
After plotting a line graph for the different age groups, we concluded that from 1990 to 2017 there has been little change in the depression percentage of each age group. We can, however, observe results similar to those of the previous bar chart: overall, the depression percentage increases with age.
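Whether the lines are really flat can be checked by comparing the first and last year for each group. This sketch uses made-up yearly means shaped like `new_data_filtered_plot5`. In the real data `Year` is stored as a float (e.g. 1990.0), so the column labels after pivoting would be floats rather than integers.

```python
import pandas as pd

# Made-up yearly means shaped like new_data_filtered_plot5 (illustrative only)
yearly = pd.DataFrame({
    "Year": [1990, 2017] * 3,
    "Age_Group": ["10-14 years"] * 2 + ["50-69 years"] * 2 + ["70+ years"] * 2,
    "Percent_Value": [1.20, 1.25, 4.60, 4.55, 5.40, 5.50],
})

# One row per age group, one column per year
by_group = yearly.pivot(index="Age_Group", columns="Year", values="Percent_Value")

# Percentage-point change between the first and last year
by_group["Change"] = by_group[2017] - by_group[1990]

# Sorting by the 1990 level makes the age ordering easy to read
print(by_group.sort_values(1990))
```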
#Bar graph to visualize mean depression percentage
# Keep only the 10-14, 15-49, 50-69 and 70+ groups (drop the aggregates and the 15-34 five-year bands)
new_data_filtered = new_data[~new_data.Age_Group.isin(
    ["All age groups", "Age_Standardized", "15-19 years", "20-24 years", "25-29 years", "30-34 years"])]
data_plot2 = new_data_filtered.groupby(["Continent","Age_Group"],as_index=False)[['Percent_Value']].mean()
plot2 = px.bar(data_plot2, x="Percent_Value", y="Continent", color="Age_Group",orientation = 'h',
labels={"Percent_Value": "Depression Percentage","Age_Group": "Age Groups"},
title="Depression Percentages Across Continents")
plot2.show()
The horizontal bar graph above shows the mean depression percentage (x-axis) for each continent (y-axis) from 1990 to 2017. The bars are stacked by age group, and hovering over a segment shows the depression percentage for that age group.
Overall, the depression percentage increases with age. For example, after hovering over each of the age groups in the continents, we can conclude the following:
# Grouping the dataset by Continent and Gender to view the mean percentages of Depression
data_plot1 = data_by_gender_v2.groupby(["Continent","Gender"],as_index=False)[['Depression_percentage']].mean()
plot1 = px.bar(data_plot1, x="Continent", y="Depression_percentage", color="Gender",
labels={"Depression_percentage": "Depression Percentage"},
title="Males vs Females Depression Rates Across Continents")
plot1.show()
The graph above shows the continents on the x-axis and depression percentages on the y-axis. The bars are also stacked by gender, and hovering over a section shows the respective depression percentage in males and females.
When we hover over each bar, we can see the corresponding depression percentage of both males (red) and females (purple). We can conclude that in each of the continents the percentage of depression is considerably higher in females than in males.
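The size of the gender difference can be stated per continent instead of read off by hovering. This sketch uses made-up continent means shaped like `data_plot1`; the continent labels are placeholders. Pivoting puts males and females side by side. From there both the absolute gap in percentage points and the relative female-to-male ratio follow.

```python
import pandas as pd

# Made-up continent means shaped like data_plot1 (illustrative only)
means = pd.DataFrame({
    "Continent": ["Continent A", "Continent A", "Continent B", "Continent B"],
    "Gender": ["Females", "Males", "Females", "Males"],
    "Depression_percentage": [4.6, 3.3, 4.4, 2.9],
})

by_gender = means.pivot(index="Continent", columns="Gender", values="Depression_percentage")
by_gender["Gap_pp"] = by_gender["Females"] - by_gender["Males"]          # percentage points
by_gender["Female_to_Male"] = by_gender["Females"] / by_gender["Males"]  # relative size
print(by_gender.round(2))
```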
#Scatter graph to visualize male vs females per continent
plot4 = px.scatter(data_by_gender.dropna(), x="Males", y="Females", animation_frame="Year", animation_group="Country",
size="Total_Population", color="Continent", hover_name="Country", facet_col="Continent",
log_x=True, size_max=45, title="Scatter Plot across Continents from 1990-2017")
plot4.show()
Analyzing this scatter plot, we can infer the following:
On this graph we compare males (x-axis) and females (y-axis) across all countries (grouped by continent) between 1990 and 2017 to see how they relate to one another. The graph is interactive, and we can look at the trends through the years by clicking the play button on the timeline.
Looking at trends across all continents, we can observe that as the percentage of depressed males goes up, so does the percentage of depressed females. However, when animating through the timeline, the bubbles barely move, which indicates little change in the percentages throughout the years.
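The "males up, females up" pattern can be summarised as a correlation per continent. This sketch uses made-up country-level values. It assumes a single year has been selected first (for example by filtering `Year == 2017`), so that each country contributes one point. The scatter plot uses a log x-axis, but the Pearson correlation here is computed on the raw percentages. It describes co-movement across countries, not a causal link.

```python
import pandas as pd

# Made-up country-level values shaped like data_by_gender for a single year
countries = pd.DataFrame({
    "Continent": ["Continent A"] * 3 + ["Continent B"] * 3,
    "Males":   [2.0, 2.5, 3.0, 2.2, 2.8, 3.4],
    "Females": [3.1, 3.6, 4.2, 3.0, 3.9, 4.5],
})

# Pearson correlation between male and female prevalence within each continent
corr_by_continent = {
    continent: group["Males"].corr(group["Females"])
    for continent, group in countries.groupby("Continent")
}
print({k: round(v, 3) for k, v in corr_by_continent.items()})
```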
#Line graph to visualize Depression percentages by Gender
data_by_gender_v2_plot6 = data_by_gender_v2.groupby(["Year","Gender"],as_index=False)[['Depression_percentage']].mean()
plot6 = px.line(data_by_gender_v2_plot6, x="Year", y="Depression_percentage", color="Gender",line_shape="spline",
render_mode="svg",title="Depression Percentages for Males and Females",
labels={"Depression_percentage": "Depression Percentage"})
plot6.show()
After plotting a line graph for males and females from 1990 to 2017, we observe little change in the depression percentages of either gender. We can conclude that across all continents the percentage of depression is considerably higher in females (approx. 4.2%) than in males (approx. 2.8%).
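The stability of the gender gap can be checked by computing it year by year. The yearly means below are illustrative values shaped like `data_by_gender_v2_plot6` and are not results from the dataset. If the gap's range over the period is small relative to the gap itself, the two lines are essentially parallel.

```python
import pandas as pd

# Made-up yearly means shaped like data_by_gender_v2_plot6 (illustrative only)
yearly_gender = pd.DataFrame({
    "Year": [1990, 1990, 2000, 2000, 2017, 2017],
    "Gender": ["Females", "Males"] * 3,
    "Depression_percentage": [4.20, 2.80, 4.22, 2.79, 4.18, 2.82],
})

by_year = yearly_gender.pivot(index="Year", columns="Gender", values="Depression_percentage")
gap = by_year["Females"] - by_year["Males"]  # female-minus-male gap, percentage points

print(gap.round(2).to_dict())           # gap in each year
print(round(gap.max() - gap.min(), 2))  # how much the gap moves over the period
```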
#Sunburst Graph
new_data_v2 = new_data.dropna()
plot3 = px.sunburst(new_data_v2, path=['Continent', 'Country'], values='Percent_Value_x100',
color='Percent_Value', hover_data=['Country_Code'],
title="Depression Percentage by Continent-Country",
labels={"Percent_Value": "Depression Percentage"})
plot3.show()
Our team decided to use the above sunburst graph to cross-check our previous tree map results and to look at the rankings of continents by mean depression percentage. Our results are the following:
We can also interact with the graph: clicking on any one of the continents shows the different countries in that continent. Additionally, we can hover over any country or continent to view its corresponding mean depression percentage. The colorscale on the right (dark to light) shows how high the depression percentages are in those countries and continents.
After comparing our results from the sunburst graph, we can confirm that our rankings and intensities are similar to those of our previous tree map.
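As we understand Plotly Express's `path` option, a parent sector's size is the sum of its children's `values`. With `values='Percent_Value_x100'`, a continent's area therefore grows with the number of country, year and age-group rows it contains, not only with how high its percentages are. A direct ranking by mean is a useful cross-check of the continent ordering. The sketch below uses made-up rows to show that the two orderings can disagree.

```python
import pandas as pd

# Made-up country rows (illustrative only)
rows = pd.DataFrame({
    "Continent": ["Continent A"] * 3 + ["Continent B"] * 2,
    "Country": ["A1", "A2", "A3", "B1", "B2"],
    "Percent_Value": [3.0, 3.2, 3.1, 4.0, 4.2],
})

# Summing the values, as sector sizes do, favours continents with more rows
size_by_sum = rows.groupby("Continent")["Percent_Value"].sum()

# Ranking by the mean compares typical prevalence instead
rank_by_mean = rows.groupby("Continent")["Percent_Value"].mean().sort_values(ascending=False)

print(size_by_sum.idxmax(), rank_by_mean.index[0])
```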
How does each country rank with its population suffering from a specific mental illness?
With a range of visualizations, including tree maps, bar charts and geographical maps, we concluded that the top 3 continents with the highest mean depression percentage between 1990 and 2017 are Africa, Asia and Europe, respectively. As a team we also added an interactive way to look at each country with its ranking for each of the mental disorders. In addition, a cartographic map was created to visualize the mean percentages of various illnesses across the world between 1990 and 2017 with the help of a colormap.
What kind of relationship patterns do we see between the disorders?
What is the general trend of education level based on the three main educational levels?
General insights into how depression percentages compared between people actively looking for jobs vs job searchers?
How do the depression levels change over the years across both gender and age groups?
Analyze/derive patterns if any across various continents by gender or age groups.
Elyse, P. (2020) 10 happiest countries in the world 2014, Miratel Solutions Inc. Gallup-Healthways Global Well-Being Index. Available at: https://miratelinc.com/blog/10-happiest-countries-in-the-world-2014/ (Accessed: October 13, 2022).
Institute for Health Metrics and Evaluation (no date) GBD results. Available at: https://vizhub.healthdata.org/gbd-results/ (Accessed: September 29, 2022).
OECD (no date) Education at a Glance, OECD statistics. Available at: https://stats.oecd.org/Index.aspx?datasetcode=EAG (Accessed: October 4, 2022).
International Classification of Diseases (ICD) (no date) World Health Organization. World Health Organization. Available at: https://www.who.int/classifications/classification-of-diseases (Accessed: October 4, 2022).
Open Sourcing Mental Illness, LTD (2016) Mental Health in Tech Survey, Kaggle. OSMH. Available at: https://www.kaggle.com/datasets/osmi/mental-health-in-tech-survey (Accessed: October 15, 2022).